FIQProf/profanal - FIQ-based CPU profiler for RISC OS 5 HAL machines
----------------


What they do
------------

FIQProf is a BASIC program that configures a HAL timer to generate FIQs, and then installs a small routine on the FIQ vector which samples a selection of CPU registers and performance counters each time the timer interrupts. The samples are stored in a buffer which can later be saved to disc.

profanal is a single-tasking C program designed to analyse the log buffer generated by FIQProf. It can produce human-readable timelines and graphs of CPU activity and performance counter values. Using the module list generated by FIQProf, it is able to identify which module the CPU was executing. It also supports loading GPA-format map/symbol files as produced by Acorn/Castle/ROOL C/C++, allowing the source file, function & line to be identified. Absolute files containing embedded function names can also be loaded.


How to use FIQProf
------------------

FIQProf should be run on the target machine. Currently only Iyonix and OMAP3 targets are supported. Support for other machines can be added, providing the HAL allows for timers to be configured to generate FIQs.

FIQProf relies on several configuration values. The default values can be overridden by setting system variables (e.g. '*set FIQProf$TimerNumber 1') or by passing the value as a parameter on the command line (e.g. '-TimerNumber=1'). Command line parameters take precedence over system variables, and system variables take precedence over the hardcoded defaults. Additionally, whenever FIQProf is run, the FIQProf$* system variables are updated with the new configuration values, so that you don't have to retype command line options each time. When FIQProf is run it will list the current values of the configuration options.
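The precedence rule above can be sketched in C as follows. FIQProf itself is a BASIC program, so this is only an illustration: `find_cmdline_option` and `resolve_option` are hypothetical names, and reading the FIQProf$<name> system variable is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Find "-<name>=<value>" on the command line. The match is case-sensitive,
   since FIQProf's command line parameters are case-sensitive. */
static const char *find_cmdline_option(int argc, char **argv, const char *name)
{
    size_t len = strlen(name);

    for (int i = 1; i < argc; i++) {
        const char *arg = argv[i];
        if (arg[0] == '-' && strncmp(arg + 1, name, len) == 0 && arg[1 + len] == '=')
            return arg + 2 + len;
    }
    return NULL;
}

/* Resolve one option. 'sysvar' is the value of FIQProf$<name>, or NULL if
   the variable is unset (how it is read is up to the caller). */
static const char *resolve_option(int argc, char **argv, const char *name,
                                  const char *sysvar, const char *def)
{
    const char *value = find_cmdline_option(argc, argv, name);

    if (value != NULL)
        return value;   /* command line wins */
    if (sysvar != NULL)
        return sysvar;  /* then the system variable */
    return def;         /* then the hardcoded default */
}
```

For example, with '-TimerNumber=2' on the command line, FIQProf$TimerNumber set to 3 and a default of 1, `resolve_option` returns "2"; without the command line option it returns "3".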

Option       Default   Description
------       -------   -----------
TimerNumber  1         HAL timer to use. Timer 0 is prohibited, as it is used
                       by the OS for the system timer. FIQProf will warn you
                       if an invalid timer has been selected.
TimerFreq    25000     Frequency of timer interrupt, in Hz. Values of up to
                       400kHz have been used successfully on an Iyonix. Higher
                       values may cause instability or cause the machine to
                       become unresponsive until the buffer fills.
DASize       16        Size of dynamic area to create, in megabytes. If a
                       dynamic area of the configured size cannot be created
                       (e.g. if the WIMP isn't running and all of the
                       application slot is being used by BASIC) then FIQProf
                       will attempt to allocate from the RMA instead.
DAName       FIQProf   Name of dynamic area. When FIQProf starts it will search
                       for any existing dynamic areas with an identical name
                       and remove them.
Registers    &18000    Hex mask of registers to sample. Bits 0-15 map to
                       R0-R15, and bit 16 maps to the PSR. It is possible to
                       perform profiling without R15 being sampled, but the
                       resultant log file will be mostly useless with profanal.
PerfCounters (none)    Comma-separated list of performance counters to enable.
                       If the first entry is 'C', the cycle counter will be
                       enabled. Other entries must be the CPU-specific
                       performance counter numbers, as listed in the IOP321
                       developer's manual & Cortex-A8 TRM. Currently the IOP
                       and Cortex-A8 are the only CPUs for which performance
                       counter sampling will work. Also note that numbers must
                       be decimal, not hex.
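As a sketch of how the Registers and PerfCounters values are interpreted, the C below decodes a register mask into names and splits a counter list. FIQProf itself is BASIC, and `describe_regmask` and `parse_perf_counters` are hypothetical names. With the default mask &18000 only bits 15 and 16 are set, so only R15 (the PC) and the PSR are sampled.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write the names selected by a Registers mask into 'out', separated by
   spaces. Bits 0-15 select R0-R15 and bit 16 selects the PSR; any higher
   bits are ignored in this sketch. Returns the number of registers
   selected, or -1 if 'out' is too small. */
static int describe_regmask(unsigned long mask, char *out, size_t outlen)
{
    int count = 0;
    size_t used = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';
    for (int bit = 0; bit <= 16; bit++) {
        char name[8];
        int n;

        if (!(mask & (1UL << bit)))
            continue;
        if (bit == 16)
            strcpy(name, "PSR");
        else
            snprintf(name, sizeof name, "R%d", bit);
        n = snprintf(out + used, outlen - used, "%s%s", count ? " " : "", name);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
        count++;
    }
    return count;
}

/* Parse a PerfCounters list such as "C,2,7": an optional leading 'C' enables
   the cycle counter, and the remaining entries are decimal event numbers.
   Returns the number of event counters stored in 'events', or -1 for a
   malformed list (e.g. hex such as "&FF") or more than 'max_events' entries. */
static int parse_perf_counters(const char *list, int *cycle,
                               int *events, int max_events)
{
    const char *p = list;
    int n = 0;

    *cycle = 0;
    if (p[0] == 'C' && (p[1] == ',' || p[1] == '\0')) {
        *cycle = 1;
        p += (p[1] == ',') ? 2 : 1;
    }
    while (*p != '\0') {
        char *end;
        long v = strtol(p, &end, 10);

        if (end == p || v < 0 || v > INT_MAX || n == max_events)
            return -1;
        events[n++] = (int)v;
        if (*end == ',')
            p = end + 1;
        else if (*end != '\0')
            return -1;
        else
            p = end;
    }
    return n;
}
```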


If FIQProf was given valid parameters and was able to allocate the required buffer, it will configure the desired timer as best it can, and insert a routine into the FIQ vector space (&1C-&FC) to perform the sampling. Any existing FIQ claimant (there usually isn't one) will be overwritten. Once the buffer fills, the interrupt handler will automatically disable the interrupt.

To check on progress, you can use *Memory or BASIC's ! operator to examine the address space occupied by the FIQ handler. When run, FIQProf will tell you the location of the write pointer (write_ptr). If write_ptr!0 matches write_ptr!4 then the buffer is full and the FIQ handler has been disabled.
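A minimal C sketch of the progress check, assuming (this is not stated above) that the word at write_ptr+0 is the address where the next sample will be written and the word at write_ptr+4 is the end of the buffer. The function names are hypothetical, and `volatile` is used because the FIQ handler updates these words behind the reader's back.

```c
#include <stdint.h>

/* Full when the next write address has reached the end address
   (assumed layout: word 0 = next write address, word 1 = buffer end). */
static int fiqprof_buffer_full(const volatile uint32_t *write_ptr)
{
    return write_ptr[0] == write_ptr[1];
}

/* Bytes remaining before the buffer fills, under the same assumption. */
static uint32_t fiqprof_bytes_left(const volatile uint32_t *write_ptr)
{
    uint32_t next = write_ptr[0]; /* read each word once */
    uint32_t end = write_ptr[1];

    return end > next ? end - next : 0;
}
```

On RISC OS you would pass the address that FIQProf prints, cast to a pointer.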

In order to save the buffer contents to disc, you can use the FIQProfSave command. This is just implemented as an alias to *Save, so you don't have to keep track of the buffer start & length yourself. Although you can save the buffer while it's still filling, be aware that profanal currently doesn't filter out the 'invalid' entries that the buffer is filled with on creation.

As long as you haven't changed the timer number, FIQProf should behave correctly if you try running it while a previous profile session is already running.

Also, if FIQProf allocated the log buffer from the RMA instead of a dynamic area, it should still be able to identify the location of the buffer and deallocate it when it is next run - the system variable FIQProf$RMABase will have been set to the base address of the buffer.

If you pass FIQProf the command line option -Kill, it will disable the FIQ vector and deallocate any buffers it can identify, and then exit without setting up the new profiling run.

Note that all command line parameters are case-sensitive.


How to use profanal
-------------------

profanal is a command-line tool that interacts with the user via a simple command interpreter. It is intended for use on machines with support for large wimpslots, as it will load the entire log buffer into the application space and does not make use of dynamic areas.

When you run profanal you can have it automatically load a log file by specifying the filename on the command line.

The command interpreter accepts a number of commands. These are as follows:

General commands
----------------

help             Display this help
script <file>    Load commands from <file>
*<command>       Run shell command
quit             Quit
exit             Quit

Data loading
------------

loadprof <filename>                    Load profiling data
loadgpa <modulename> <filename>        Load GPA file
loadabs <filename>                     Load Absolute file containing 'poked'
                                       function names
loadsyms <modulename> <filename>       Load symbols file
                                       (Norcroft -Symbols option)
loadrom <builddir> <romname>           Load module list and symbols from ROM
                                       build tree. E.g.
                                       'loadrom <Build$Dir> aUVZ00-00'
modules                                List modules
module <name> <start> [+ <len>|<end>]  Add/update module

Data analysis
-------------

info <addr>                            Display information about address
plist                                  Show available performance counters
dump <start> + <count> [max <lines>]   Dump samples, using configured filters
dump <start> <end> [max <lines>]
dump <start> max <lines>
dump [max <lines>]
graph <width> <start> + <count>        ASCII graph of CPU usage
graph <width> <start> <end>       
graph <width>                     
hist <start> + <count> [max <lines>]   Generate a histogram showing frequency
hist <start> <end> [max <lines>]       of occurrence of functions/modules
hist [max <lines>]
pgraph <ctr> <w>*<h> <start> + <count> ASCII graph of a performance counter,
pgraph <ctr> <w>*<h> <start> <end>     or of one counter divided by another
pgraph <ctr> <w>*<h>                   (see 'filter perf' for <ctr> syntax)
dumpto [<filename>]                    Redirect 'dump', 'graph', 'hist' &
                                       'pgraph' output to file
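'hist' reports how often each function or module occurs in the selected samples. A minimal C sketch of that counting, assuming each sample has already been attributed to a module index; `struct hist_entry` and `build_histogram` are hypothetical rather than taken from profanal, and the busiest-first ordering is a choice made here.

```c
#include <stdlib.h>

struct hist_entry {
    int module;          /* module index */
    unsigned long hits;  /* number of samples attributed to it */
};

/* Sort by descending hit count; ties are broken by module index. */
static int by_hits_desc(const void *a, const void *b)
{
    const struct hist_entry *x = a, *y = b;

    if (x->hits != y->hits)
        return x->hits < y->hits ? 1 : -1;
    return x->module - y->module;
}

/* Count how often each module index (0..nmodules-1) appears in 'samples',
   then sort busiest first. 'out' must hold nmodules entries; out-of-range
   indices are ignored. */
static void build_histogram(const int *samples, size_t nsamples,
                            struct hist_entry *out, int nmodules)
{
    for (int m = 0; m < nmodules; m++) {
        out[m].module = m;
        out[m].hits = 0;
    }
    for (size_t i = 0; i < nsamples; i++)
        if (samples[i] >= 0 && samples[i] < nmodules)
            out[samples[i]].hits++;
    qsort(out, (size_t)nmodules, sizeof out[0], by_hits_desc);
}
```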

Data filtering
--------------

filter module *|l|f|+|- <modules>...   Change module detail level:
                                       *  Show all entries
                                       l  Group by GPA lines
                                       f  Group by GPA functions
                                       +  Group by module
                                       -  Hide module
filter module *|l|f|+|- *              Change detail level for all modules
filter unknown *|+|-                   Change 'Unknown' detail level:
                                       *  Show all unknown entries
                                       +  Group unknown entries together
                                       -  Hide all unknown entries
filter reg <hex regmask>               Set which registers to display
filter perf ...                        Specify performance counters for 'dump':
filter perf 0 1 2 [...]                Show counters #0, #1, #2
filter perf 1/0 2/0 [...]              Show counters #1 and #2 divided by #0
psr raw|simple                         Change PSR display mode


Note that the command interpreter is currently very primitive. All interactions are case-sensitive, and there is very little tolerance of whitespace.

The output of 'dump' is a trace of the CPU state. Each line of the output shows the sample number, the values of the registers enabled in the regmask filter, the performance counters enabled via 'filter perf', and (if the PC was saved in the profile output) information about which module is active. If GPA debug data has been loaded for the module, or function names have been loaded via 'loadabs', then the active file, function & line number are also displayed. The filtering options for modules and the 'unknown' module can be used to cut down the number of lines in the output. The 'max' parameter can be used to specify the maximum number of lines to output, to help with fine tuning the filter options.
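Identifying the active function from loaded symbols amounts to finding the symbol with the highest start address not above the sampled PC. A minimal C sketch using a binary search over a table sorted by address; `struct symbol` and `find_symbol` are hypothetical, and profanal's actual GPA handling (including line numbers) is not shown.

```c
#include <stddef.h>
#include <stdint.h>

struct symbol {
    uint32_t addr;     /* start address of the function */
    const char *name;  /* function name */
};

/* Return the symbol with the greatest start address <= addr, or NULL if
   addr precedes every symbol. 'syms' must be sorted by ascending addr. */
static const struct symbol *find_symbol(const struct symbol *syms, size_t n,
                                        uint32_t addr)
{
    size_t lo = 0, hi = n; /* search window [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= addr)
            lo = mid + 1;  /* candidate is at mid or later */
        else
            hi = mid;
    }
    /* lo is now the first symbol starting above addr */
    return lo == 0 ? NULL : &syms[lo - 1];
}
```

Because the table stores only start addresses, an address past the last function still maps to that function; limiting the search to the containing module's address range avoids this.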

Where the 'filter module' option has been used to hide some of the 'dump' output, the performance values shown for each dump line will be the sum of the counter values for all samples until the next sample which is output. When one counter is being divided by another, the numerator and denominator are each summed before the division is performed. Additionally, if the summed denominator is 0, the value will be shown as 'numerator/0'.
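The sum-then-divide rule can be sketched in C as follows. The function name and the two decimal places used for the quotient are assumptions; only the 'numerator/0' form for a zero denominator comes from the description above.

```c
#include <stdio.h>

/* Sum both counters over samples [first, last), then divide once.
   A zero denominator sum is shown as "numerator/0". */
static void format_summed_ratio(const unsigned long *num, const unsigned long *den,
                                size_t first, size_t last,
                                char *out, size_t outlen)
{
    unsigned long long n = 0, d = 0;

    for (size_t i = first; i < last; i++) {
        n += num[i];
        d += den[i];
    }
    if (d == 0)
        snprintf(out, outlen, "%llu/0", n);
    else
        snprintf(out, outlen, "%.2f", (double)n / (double)d);
}
```

Summing first means that a sample whose own denominator is 0 does not spoil the result, as long as some other sample in the range has a non-zero denominator.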

If the PSR is shown in 'simple' mode, only the processor mode, Thumb, IRQ and FIQ bits will be shown (although the FIQ bit should always be 0, since it's impossible for the sampling code to run if FIQs are disabled). In 'raw' mode, the raw hex value is displayed.
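A sketch of a 'simple' PSR decoder in C. The bit positions and mode encodings are those of the 32-bit ARM CPSR (mode in bits 0-4, T in bit 5, F in bit 6, I in bit 7); the text layout, e.g. "SVC -I-" for an SVC-mode sample with IRQs disabled, is an assumption rather than profanal's actual format, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Name of the ARM processor mode held in PSR bits 0-4. */
static const char *psr_mode_name(uint32_t psr)
{
    switch (psr & 0x1Fu) {
    case 0x10: return "USR";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "SVC";
    case 0x17: return "ABT";
    case 0x1B: return "UND";
    case 0x1F: return "SYS";
    default:   return "???";
    }
}

/* Mode name, then T (Thumb, bit 5), I (IRQs disabled, bit 7) and
   F (FIQs disabled, bit 6), with '-' for a clear bit. */
static void psr_simple(uint32_t psr, char *out, size_t outlen)
{
    snprintf(out, outlen, "%s %c%c%c", psr_mode_name(psr),
             (psr & (1u << 5)) ? 'T' : '-',
             (psr & (1u << 7)) ? 'I' : '-',
             (psr & (1u << 6)) ? 'F' : '-');
}
```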

The output of 'graph' shows the CPU usage of each module. Each column of the graph is 1 character wide, so the digits 0-9 are used to show the CPU usage in units of 10%. A cell is blank if there is no CPU usage; 0 means above 0% but below 10%; 1 means from 10% to below 20%; and so on, with 9 used for anything from 90% to below 100%. Exactly 100% is represented with '*'.
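The cell mapping can be written as a small C function (the name `graph_cell` is hypothetical). Integer division of hits * 10 by the column total gives the 10% band, with the blank and '*' cases handled first. 'pgraph' is described as using the same scheme.

```c
/* Map a column's hit count to a graph cell: blank for no hits, '0'-'9'
   for each 10% band below 100%, '*' for 100%. */
static char graph_cell(unsigned long hits, unsigned long total)
{
    if (total == 0 || hits == 0)
        return ' ';
    if (hits >= total)
        return '*';
    /* hits < total here, so the band is at most 9 */
    return (char)('0' + (hits * 10) / total);
}
```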

'pgraph' represents each cell in the same way as 'graph', i.e. a blank cell has 0 hits, a cell containing '0' has between 0% and 10% of the hits for that column, etc. Note that 'pgraph' and 'filter perf' accept performance counter numbers as reported by 'plist' - these are not the same as the performance counter numbers that you specify to FIQProf!


Further notes and technical details
-----------------------------------

Modules - Before FIQProf starts the profiler, it enumerates the module list and adds it to the start of the log buffer. profanal then uses this list as the basis for its module list. Any changes to the module list after profiling begins will not be tracked.

Dynamic areas - FIQProf also places a list of dynamic areas into the log buffer.

WIMP tasks - There's currently no support for tracking the active WIMP task. FIQProf is therefore unsuitable if you want to profile WIMP applications.

Standard 'modules' - When profanal loads a log file, it also adds a selection of 'standard' modules. These include the processor vectors (which occasionally show up in profile samples), along with a 512MB block for application space (this value is correct for the current HAL memory map). It also adds a 64k 'HAL' module at address &FC000000, and renames the UtilityModule to Kernel and adjusts its start address to lie at &FC010000.  This allows Kernel GPA files to be easily loaded and used to profile the kernel.
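To illustrate how a sampled address can be attributed to a module, here is a minimal C sketch of a range lookup. It is not profanal's code: `struct module_range` and `module_for_address` are hypothetical, and the first matching entry wins, so table order matters if ranges overlap.

```c
#include <stddef.h>
#include <stdint.h>

struct module_range {
    const char *name;
    uint32_t start;  /* base address */
    uint32_t len;    /* size in bytes */
};

/* Return the name of the first module whose [start, start+len) range
   contains addr, or "Unknown". The unsigned subtraction wraps when
   addr < start, so no separate lower-bound check is needed. */
static const char *module_for_address(const struct module_range *mods,
                                      size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr - mods[i].start < mods[i].len)
            return mods[i].name;
    return "Unknown";
}
```

For example, with a table holding the 64k HAL entry at &FC000000, an address of &FC00FFFF resolves to "HAL", while an address not covered by any entry resolves to "Unknown".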

Iyonix support - The current Iyonix HAL as used in RISC OS 5.16 doesn't support the mapping of the IOP's timers to FIQs. Therefore if an Iyonix is detected (by the presence of the NVidia module) then custom code is used to manually enable FIQs for the desired timer. This will almost certainly lead to a total system lockup if the FIQProf FIQ handler is overwritten before it is allowed to complete its job.


History
-------

Release 6 - 15/2/2020
- Fixed FIQProf to record the correct information for PMP DAs
- Fixed FIQProf to cope with recent Iyonix ROMs which use HAL_TimerIRQClear
- Fixed profanal to not lose track of the high processor vectors when 'loadrom' is used
- Changed FIQProf to construct a module containing the FIQ handler, so it can reclaim the FIQ vector if the OS (or other software) temporarily takes it over via Service_ClaimFIQ / Service_ReleaseFIQ (which can happen frequently with modern versions of RISC OS 5)
Release 5 - 1/9/2012
- Added support for 'loadsyms' and 'loadrom' commands
- Added support for HALs that use HAL_TimerIRQClear to clear timer interrupts (e.g. BCM2835)
- Added support for kernels that run with high processor vectors enabled
Release 4 - 12/2/2011
- Added support for recording and displaying the IOP321 & Cortex-A8 performance counters
- Updated FIQProf to save the dynamic area list
- Made the 'Application_space' module shrink itself to the appropriate size if Aemulor is active
- Old profile data will not be compatible with this version of FIQProf!
Release 3 - 28/9/2010
- Added loadabs function for loading absolute files with embedded function names
Release 2 - 4/10/2009
- Corrected loading of GPA files
Release 1 - 20/9/2009
- First version


Legal
-----

FIQProf and profanal are Public Domain software.


Contact info
------------

Jeffrey Lee
me@phlamethrower.co.uk
http://www.phlamethrower.co.uk/

